Transfer Learning for Constituency-Based Grammars
Abstract
In this paper, we consider the problem of cross-formalism transfer in parsing. We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific to the target formalism and a large quantity of coarse CFG annotations from the Penn Treebank. While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they also encode additional constraints and semantic features. To handle this apparent discrepancy, we design a probabilistic model that jointly generates CFG and target formalism parses. The model includes features of both parses, allowing transfer between the formalisms while preserving parsing efficiency. We evaluate our approach on three constituency-based grammars (CCG, HPSG, and LFG), augmented with the Penn Treebank-1. Our experiments show that across all three formalisms, the target parsers significantly benefit from the coarse annotations.
Similar resources
Exocentric (bahuvrīhi) Compounds in Classical Sanskrit
Constituency grammars originated with Leonard Bloomfield (1933) and were developed during the 1940s and 1950s by a number of American structuralist linguists, including Harris (1946) and Wells (1947), to mention just two. In the late 1950s, Chomsky (1957) suggested that constituency grammars could be formalized as context-free grammars. It is now clear that con...
Joint Learning of Constituency and Dependency Grammars by Decomposed Cross-Lingual Induction
Cross-lingual induction aims to acquire linguistic structures for one language by drawing on annotations from another language. It works well for simple structured prediction problems such as part-of-speech tagging and dependency parsing, but has made little progress on more complicated problems such as constituency parsing and deep semantic parsing, mainly due to the structural non-...
Extracting LTAG Grammars from a Spanish Treebank
Treebank grammars have been known to help in building robust, wide-coverage statistical parsers that also obtain state-of-the-art accuracies. In this work, we present a system that extracts LTAG grammars for Spanish from a constituency-based Spanish treebank. We evaluate the extracted grammar in terms of its size, its coverage on unseen data, and the performance of a supertagger trained on it. The s...
Automatic Conversion of a Persian Dependency Treebank to a Constituency Treebank
There are two major types of treebanks: dependency-based and constituency-based. Both have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian. However, no large constituency treebank is available for this language. In this paper, we aim to propose an algorithm for automatic conversion of a depe...
On Relations of Constituency and Dependency Grammars
This paper looks at integrating dependency and constituency into a common framework, using the TAG formalism and a different perspective on the meta-level grammar of Dras (1999a) in which the meta level models dependencies and the object level models constituency. This framework gives consistent dependency analyses of raising verbs interacting with bridge verbs, additionally giving a solution to...
Publication date: 2013